Evaluation plan for the North- and South-Dutch Benchmark Evaluation of Speech recognition Technology (N-Best 2008)
Authors
Abstract
In 2005, the Flemish/Dutch programme STEVIN supported the project N-Best, whose goal is to evaluate the performance of present-day large vocabulary continuous speech recognition (LVCSR) systems for the Dutch language. For a number of years, several research groups in Flanders and the Netherlands have studied various aspects of LVCSR systems for Dutch. So far, however, it has been hard to compare the results of Dutch LVCSR systems because no common evaluation material has been defined. The N-Best Evaluation is the first attempt to define an evaluation task and speech database and to benchmark several speech recognition systems on this task. The goal is to set up an evaluation framework that makes it easier to organize future evaluations, in order to follow system progress and possibly shift the focus of the task. This document specifies the tasks, data, performance measures, rules and time schedule of the evaluation. The document was first conceived in August 2006 and finalized in January 2007, after consultation and discussion with the Automatic Speech Recognition (ASR) partners in the N-Best project. The evaluation is set up in a way that resembles the various successful NIST evaluations of speech technology, specifically those on speech recognition. We have consulted the various evaluation plans of past NIST evaluations [2, 3], as well as the evaluation plan of the French ESTER evaluation [1] held within the EVALDA evaluation campaign. The evaluation will be conducted by TNO Human Factors in the Netherlands (the evaluator), and is open to all research institutes and industries on a voluntary basis. Organizations participating in the evaluation will be named 'ASR sites' hereafter.
The evaluation will consist of an optional 'dry run,' in which the development test data, as well as the ASR results, will be formatted in evaluation style, so that both the ASR sites and the evaluator can 'rehearse' the actions needed for running the evaluation, and comment on and possibly alter some aspects of the evaluation. Then, the evaluation itself will take place: data will be sent to the ASR sites, which will process the evaluation data and send results to the evaluator. The evaluator scores the results, after which there will be an adjudication period in which ASR sites will be given the opportunity to comment on certain interpretations and decisions made by the evaluator. Finally, in a workshop, the evaluator and ASR sites will present their results to the others, accompanied by a paper describing their systems in more detail.
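The performance measures themselves are specified later in the document; for LVCSR benchmarks of this kind, the standard metric is word error rate (WER), the word-level edit distance between the reference transcript and the recognizer output, normalized by the reference length. The following is a minimal illustrative sketch of that computation (the actual evaluation uses dedicated scoring tools with text normalization, which this sketch omits):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance, normalized by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # dp[i][j] = minimum edits to turn the first i reference words
    # into the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = dp[i - 1][j] + 1
            insertion = dp[i][j - 1] + 1
            dp[i][j] = min(substitution, deletion, insertion)
    return dp[len(ref)][len(hyp)] / len(ref)
```

For example, scoring the hypothesis "a x c" against the reference "a b c d" yields one substitution and one deletion, so a WER of 2/4 = 0.5.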
Similar resources
N-Best: The Northern- and Southern-Dutch Benchmark Evaluation of Speech Recognition Technology
In this paper, we describe N-Best 2008, the first large vocabulary continuous speech recognition (LVCSR) benchmark evaluation held for the Dutch language. Both the accent as spoken in the Netherlands (Northern Dutch) and in Belgium (Southern Dutch, or Flemish) will be evaluated. The evaluation tasks are broadcast news (BN) and conversational telephone speech (CTS). The N-Best evaluation will take place in...
SHoUT, the University of Twente Submission to the N-Best 2008 Speech Recognition Evaluation for Dutch
In this paper we present our primary submission to the first Dutch and Flemish large vocabulary continuous speech recognition benchmark, N-Best. We describe our system workflow, the models we created for the four evaluation tasks and how we approached the problem of compounding that is typical for a language such as Dutch. We present the evaluation results and our post-evaluation analysis.
Results of the N-Best 2008 Dutch Speech Recognition Evaluation
In this paper we report the results of a Dutch speech recognition system evaluation held in 2008. The evaluation contained material in two domains, Broadcast News (BN) and Conversational Telephone Speech (CTS), and in two main accent regions (Flemish and Dutch). In total, 7 sites submitted recognition results to the evaluation, totalling 58 different submissions in the various conditions. Bes...
A Database for Automatic Persian Speech Emotion Recognition: Collection, Processing and Evaluation
Recent developments in robotics automation have motivated researchers to improve the efficiency of interactive systems by making man-machine interaction more natural. Since speech is the most popular method of communication, recognizing human emotions from the speech signal has become a challenging research topic known as Speech Emotion Recognition (SER). In this study, we propose a Persian em...
The Albayzin 2008 Language Recognition Evaluation
The Albayzin 2008 Language Recognition Evaluation was held from May to October 2008, and its results were presented and discussed among the participating teams at the 5th Biennial Workshop on Speech Technology [1], organized by the Spanish Network on Speech Technologies [2] in November 2008. In this paper, we present (for the first time) a full description of the Albayzin 2008 LRE and analyze and ...
Journal:
Volume, Issue:
Pages: -
Publication date: 2007